Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added resource pool wait time histogram metrics #5727

Conversation

lcabancla
Copy link
Contributor

@lcabancla lcabancla commented Jan 16, 2020

Currently, the vttablet currently only exposes resource pool wait count and aggregated wait times. Adding timing histogram metrics to the resource pool would provide information on the distribution of the wait time latencies. This is specially useful to in determining tail latencies.

Example output from /debug/vars (look for TransactionPoolResourceWaitTime):

"Waits": {"TotalCount":186,"TotalTime":1559667140,"Histograms":{"Consolidations":{"500000":51,"1000000":96,"5000000":159,"10000000":163,"50000000":163,"100000000":163,"500000000":163,"1000000000":163,"5000000000":163,"10000000000":163,"inf":163,"Count":163,"Time":200226919},"TransactionPoolResourceWaitTime":{"500000":0,"1000000":0,"5000000":0,"10000000":1,"50000000":11,"100000000":20,"500000000":23,"1000000000":23,"5000000000":23,"10000000000":23,"inf":23,"Count":23,"Time":1359440221}}}

@lcabancla lcabancla requested a review from sougou as a code owner January 16, 2020 22:17
@lcabancla lcabancla force-pushed the lcabancla.resource_pool.waittime.percentiles_3 branch 4 times, most recently from e291519 to b891a76 Compare January 17, 2020 19:50
@lcabancla lcabancla changed the title WIP: Added resource pool wait time histogram metrics Added resource pool wait time histogram metrics Jan 17, 2020
@@ -325,6 +331,9 @@ func (rp *ResourcePool) SetCapacity(capacity int) error {
func (rp *ResourcePool) recordWait(start time.Time) {
rp.waitCount.Add(1)
rp.waitTime.Add(time.Since(start))
if rp.name != "" {
rp.waitStats.Record(rp.name+"ResourceWaitTime", start)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the name get added automatically farther up the stack?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. As far as I know it is only used to publish stats.

Copy link
Contributor

@sougou sougou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend passing in a logWait func(elapsed time.Dration) instead. This will allow the name handling to live with the caller, which will be more readable.

Also, is this a production use case? I'm asking because there are other changes that are saying that we are over-reporting, and asking for more consolidations.

@lcabancla
Copy link
Contributor Author

lcabancla commented Jan 24, 2020

@sougou
That sounds good. Will look into it.
Yes, this is a production use case. Would you mind citing the other changes so I could get some context?

@sougou
Copy link
Contributor

sougou commented Feb 3, 2020

Yes, this is a production use case. Would you mind citing the other changes so I could get some context?

#5649 is one. I also know that vtgate over-reports on calls to vttablets. It's an issue if you have too many shards. I'm about to publish a proposal to help solve this problem. RFC will be out soon.

@sougou
Copy link
Contributor

sougou commented Feb 12, 2020

@lcabancla see #5809. I think it alleviates the problem enough that this can be pushed through. Can you address the review comment?

@lcabancla
Copy link
Contributor Author

@sougou Yes, will work on the comment and update this. Thanks.

@lcabancla lcabancla force-pushed the lcabancla.resource_pool.waittime.percentiles_3 branch 4 times, most recently from d758fd1 to 5add09a Compare February 13, 2020 21:58
@lcabancla lcabancla force-pushed the lcabancla.resource_pool.waittime.percentiles_3 branch from 5add09a to 5d172c4 Compare February 13, 2020 22:01
@lcabancla
Copy link
Contributor Author

@sougou Updated the PR per your comment. Please take a look. Thanks!

Copy link
Contributor

@sougou sougou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this feel through the cracks. Merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants